Parallel BLAST on split databases
Abstract
SUMMARY: BLAST programs often run on large SMP machines where multiple threads can work simultaneously and there is enough memory to cache the databases between program runs. A group of programs is described which allows comparable performance to be achieved with a Beowulf configuration in which no node has enough memory to cache a database but the cluster as an aggregate does. To achieve this result, databases are split into equal-sized pieces and the pieces are stored locally on the nodes. Each query is run on all nodes in parallel, and the resulting BLAST output files from all nodes are merged to yield the final output.
AVAILABILITY: Source code is available from ftp://saf.bio.caltech.edu/
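The split-and-merge idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`parse_fasta`, `split_database`, `merge_hits`), the greedy balancing rule, and the `(subject_id, bit_score)` hit representation are all assumptions made for illustration. The sketch balances pieces by total residue count and merges per-node hit lists by bit score.

```python
import heapq
from typing import List, Tuple

Record = Tuple[str, str]   # (header, sequence); hypothetical representation
Hit = Tuple[str, float]    # (subject id, bit score); hypothetical representation


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) records."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_database(records: List[Record], n_pieces: int) -> List[List[Record]]:
    """Split records into n_pieces of roughly equal total sequence length.

    Assumed strategy: take the longest sequences first and always add
    the next one to the piece that is currently smallest.
    """
    if n_pieces < 1:
        raise ValueError("n_pieces must be at least 1")
    pieces: List[List[Record]] = [[] for _ in range(n_pieces)]
    heap = [(0, i) for i in range(n_pieces)]  # (current size, piece index)
    heapq.heapify(heap)
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        size, idx = heapq.heappop(heap)
        pieces[idx].append(rec)
        heapq.heappush(heap, (size + len(rec[1]), idx))
    return pieces


def merge_hits(per_node_hits: List[List[Hit]], max_hits: int) -> List[Hit]:
    """Combine the hit lists from all nodes and keep the best max_hits.

    Hits are ordered by descending bit score, with ties broken by subject id.
    """
    merged = [hit for hits in per_node_hits for hit in hits]
    merged.sort(key=lambda h: (-h[1], h[0]))
    return merged[:max_hits]
```

The sketch ranks merged hits by bit score because a bit score does not depend on the size of the database searched, whereas an E-value does. E-values computed against a single piece would therefore need to be rescaled to the full database size before being compared across nodes.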
Similar resources
paraBLAST: A Highly Scalable Parallelized BLAST Solution
Programs of the NCBI BLAST family have been widely used for retrieving homologous sequences from existing databases. This article briefly introduces and evaluates a parallelized version of the BLAST algorithm, paraBLAST, using Message Passing Interface (MPI) on a multi-node compute cluster. A dynamical database fragmentation scheme based on the availability of a compute cluster is proposed. Its...
TurboBLAST(r): A Parallel Implementation of BLAST Built on the TurboHub
BLAST (Basic Local Alignment Search Tool) is by far the most widely used application for rapid screening of large sequence databases. This paper describes TurboBLAST, a parallel implementation of BLAST suitable for execution on networked clusters of heterogeneous PCs, workstations, or Macintosh computers.
BGBlast: A BLAST Grid Implementation with Database Self-Updating and Adaptive Replication
BLAST is probably the most used application in bioinformatics teams. BLAST complexity tends to be a concern when the query sequence sets and reference databases are large. Here we present BGBlast: an approach for handling the computational complexity of large BLAST executions by porting BLAST to the Grid platform, leveraging the power of the thousands of CPUs which compose the EGEE infrastructu...
A Local Sequence Alignment Algorithm Using an Associative Model of Parallel Computation
Local sequence alignment is widely used to discover structural and hence, functional similarities between biological sequences. While the faster heuristic methods like BLAST and FASTA are useful to compare a single sequence to hundreds or even thousands of sequences in genetic databases such as GenBank, EMBL, and DDBJ, this work yields pairwise alignments with a high sensitivity. The heuristic ...
PARALIGN: rapid and sensitive sequence similarity searches powered by parallel computing technology
PARALIGN is a rapid and sensitive similarity search tool for the identification of distantly related sequences in both nucleotide and amino acid sequence databases. Two algorithms are implemented, accelerated Smith-Waterman and ParAlign. The ParAlign algorithm is similar to Smith-Waterman in sensitivity, while as quick as BLAST for protein searches. A form of parallel computing technology known...
Journal: Bioinformatics
Volume 19, Issue 14
Publication date: 2003